
INTERNATIONAL JOURNAL OF TECHNOLOGICAL EXPLORATION AND LEARNING (IJTEL) 

www.ijtel.org 



Implementation of Buffer for Network on Chip Router



Minakshi M. Wanjari, Dr. R. V. Kshirsagar 
Electronics Engineering Department, 
PCE, Nagpur, India 

Abstract — Network-on-Chip (NoC) introduces the design 
methodology of interconnection network into System-on-Chip 
(SoC). It overcomes the main disadvantages of traditional
bus-based SoC design, such as large delay, small link bandwidth,
and poor scalability. It is widely believed that NoC will replace
bus-based architecture to become the mainstream of SoC design 
methodology. In NoC architecture the processing elements (PEs) 
communicate with each other by exchanging messages over the 
network and these messages go through buffers in each router. 
Buffers are one of the major resources used by the routers in 
virtual channel flow control.

I. Introduction

As feature sizes continue to shrink and integration density
increases, interconnections have become a dominating factor in
determining the overall quality of a chip. Because a system bus
scales poorly and can support only a limited number of
functional units, it cannot meet the requirements of current
System-on-Chip (SoC) implementations. Long global wires also
cause many design problems, such as routing congestion, noise
coupling, and difficult timing closure. Network-on-Chip (NoC)
architectures have been proposed as an alternative that solves
these problems by using a packet-based communication network
[1, 2, 3].

In NoC, a router sends packets from a source to a
destination through several intermediate nodes. If the head of
a packet is blocked during transmission, the router cannot
forward the rest of the packet. To reduce this blocking
problem, researchers proposed the wormhole routing method.
A wormhole router splits the packet into several flits, each of
which can be transferred in a single transmission. Because
wormhole routing does not allocate buffer space to a whole
packet, buffer allocation and flow control are performed at the
flit level. Wormhole routing can therefore minimize overall
latency and may reduce buffer size compared to other methods.
In addition, virtual channels are used to avoid deadlock and
thus increase throughput.
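
The splitting of a packet into flits can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the head/body/tail flit types, the `Flit` class, and the `packetize` helper are hypothetical names that do not come from this paper, and the 32-bit flit width is borrowed from the implementation described in Section VII.

```python
from dataclasses import dataclass
from typing import List

FLIT_BITS = 32  # flit width taken from Section VII; the flit layout is assumed


@dataclass
class Flit:
    kind: str     # "head", "body", or "tail" (assumed flit types)
    payload: int  # one FLIT_BITS-wide word


def packetize(dest: int, words: List[int]) -> List[Flit]:
    """Split a packet into flits.

    The head flit carries the destination so that it can set up the path;
    the data words follow as body flits, and the last one is marked as the
    tail. With no data words, the packet consists of the head flit alone.
    """
    mask = (1 << FLIT_BITS) - 1
    flits = [Flit("head", dest & mask)]
    for i, word in enumerate(words):
        kind = "tail" if i == len(words) - 1 else "body"
        flits.append(Flit(kind, word & mask))
    return flits
```

For example, `packetize(3, [10, 20, 30])` yields four flits: a head flit carrying the destination 3, two body flits, and a tail flit. A router then only needs buffer space for individual flits rather than for the whole packet.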

Whenever a flit arrives at or departs from a router, the
router consumes considerable dynamic power, depending on the
switching activity. Therefore, buffer design plays an important
role in implementing an energy-efficient on-chip network.

II. Related work 

For flow control, switching techniques are mechanisms by 
which information is forwarded through the NoC network. 
Switching techniques have a significant influence on the design 
of router micro-architecture, and are broadly classified into 
circuit switching and packet switching, based on the network 
characteristics. Packet switching techniques are the most 
commonly used in current NoC designs [4].

Packet switching is further classified as Store and Forward
(SAF), Wormhole (WH), and Virtual Cut Through (VCT)
switching. SAF switching requires large buffers and increases
latency in the router. In the VCT switching mechanism, the
buffer requirements are reduced compared to SAF switching.
WH switching techniques are prone to deadlock when cyclic
buffer dependencies develop from the topology and routing
algorithm of the network. However, all switching mechanisms
are prone to the Head-of-Line (HoL) blocking problem, which
results from input buffering contention in destination routers.

To overcome the above problems in router switching
techniques, researchers have proposed various buffer
allocation techniques (static and dynamic), micro-architectural
buffer structures, and efficient buffer usage (arbitration)
algorithms. The most significant improvement to WH
switching is the introduction of virtual channels (VCs) [5].
W. J. Dally introduced the idea of the virtual channel to develop
deadlock-free routing algorithms for networks that use WH
routing [1]. Earlier buffer allocation techniques proposed by
various researchers include: speculative allocation [6];
traffic-aware VC allocation [7]; advance reservation control of
resources [8]; buffer size allocation based on channel
utilization [9]; and implementing VCs using asynchronous
circuit design [10]. Dally and Towles illustrate the basic virtual
channel router architecture in interconnection networks [11].

III. NoC Architecture 

A generic NoC implementation consists of a number of
Processing Elements (PEs) arranged in a mesh-like grid, as
shown in Figure 1. The PEs may be of the same type, e.g. CPUs,
or of different types, e.g. audio cores, video cores, wireless
transceivers, memory banks, etc. Each PE is connected to a
local router through a Network Interface Controller (NIC);
each router is, in turn, connected to adjacent routers, forming a
packet-based on-chip network. The NIC module packetizes
data into, and de-packetizes data from, the underlying
interconnection network. The PE together with its NIC
forms a network node. Nodes communicate with each other by
injecting data packets into the network. The packets traverse
the network toward their destination, based on various routing
algorithms and flow control mechanisms.

The heart of an on-chip network is the router, which
undertakes the crucial task of steering and coordinating the data
flow. The performance of a Network-on-Chip is determined to a
large extent by the router architecture, and the virtual-channel
router is said to be a promising choice for NoC [12].

IV. Generic Router Architecture 

In general, the router has 'P' input and 'P' output channels 
(or ports). In most implementations, P = 5; four inputs from the 
four cardinal directions (North, East, South and West) and one 
from the local Processing Element (PE), which is attached to 
the NoC router. To minimize router complexity and traffic 
congestion, NoC routers are usually assumed to connect to a 
single PE. The input/output channels may consist of 
unidirectional links (as shown in Figure 2), bidirectional, or 
even serial links. 

Each router also has five components: Routing
Computation (RC) Unit, Virtual Channel Allocator (VA),
Switch Allocator (SA), flit Buffers (BUF), and Crossbar
Switch.



Figure 1. Generic NoC architecture.

Figure 2. Generic NoC router architecture.

When the header flit arrives at the buffer, the RC unit
directs the incoming flits to one of the physical channels. The
Virtual Channel Allocation (VA) unit receives credit
information from the neighboring routers, arbitrates among all
header flits that request the same VC, and selects one of them.
The selected header flit can then set up the path and send data.
The transmitting router sends control information to the
receiving router, and the receiving router may use it to update
the VC information held at its internal buffer. The SA unit
arbitrates among the flits waiting in all VCs for access to the
crossbar and allows only one flit to access the crossbar. The SA
operation follows the VA stage, since the flit data in the
buffer comes from the previous router in the route. The flit data
then passes over the crossbar and can thus reach the destination
node.
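
The arbitration step performed by the SA unit, granting one of several requesting VCs, can be sketched as follows. The paper does not state which arbitration policy is used, so the round-robin policy below is an assumption chosen for illustration, and `round_robin_grant` is a hypothetical helper name.

```python
from typing import List, Optional


def round_robin_grant(requests: List[bool], last_grant: int) -> Optional[int]:
    """Pick one requester, or None if nobody is requesting.

    requests[i] is True when VC i has a flit waiting for the crossbar.
    The search starts just after the previously granted index, so the
    most recent winner has the lowest priority in the next round.
    """
    n = len(requests)
    for offset in range(1, n + 1):
        idx = (last_grant + offset) % n
        if requests[idx]:
            return idx
    return None
```

With requests from VC 0 and VC 2 and a previous grant to VC 0, the call `round_robin_grant([True, False, True, False], 0)` grants VC 2; on the next call with `last_grant=2`, VC 0 is granted again. Rotating the priority in this way keeps any single VC from monopolizing the crossbar.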

Buffering within a network router is necessary due to 
congestion, output link contention, and intra-router processing 
delays (e.g. routing computation), which impede data flow. In 
the case of virtual channel-based NoC routers, each input port 
consists of a number of FIFO buffers, with each FIFO 
corresponding to a virtual channel (see Figure 2). Hence, each 
input port has V virtual channels, each of which has a 
dedicated k-flit FIFO buffer (a flit is the smallest unit of flow 
control; one network packet is composed of a number of flits) 
[8]. 
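
The input port organization described above, with V virtual channels each backed by a dedicated k-flit FIFO, can be modeled with a minimal Python sketch. The class and method names (`InputPort`, `accept`, `pop`) are hypothetical, and this behavioral model says nothing about the hardware structure.

```python
from collections import deque
from typing import Deque, List


class InputPort:
    """One router input port: V virtual channels, each a k-flit FIFO."""

    def __init__(self, num_vcs: int, depth: int) -> None:
        self.depth = depth
        self.vcs: List[Deque[int]] = [deque() for _ in range(num_vcs)]

    def accept(self, vc_id: int, flit: int) -> bool:
        """Store a flit in the FIFO selected by its VC identifier.

        Returns False, without storing the flit, if that FIFO is full.
        """
        fifo = self.vcs[vc_id]
        if len(fifo) >= self.depth:
            return False
        fifo.append(flit)
        return True

    def pop(self, vc_id: int) -> int:
        """Remove and return the oldest flit of the given VC."""
        return self.vcs[vc_id].popleft()
```

With `port = InputPort(num_vcs=4, depth=4)`, filling VC 0 with four flits makes a fifth `accept(0, ...)` return False, while `accept(1, ...)` still succeeds. A full virtual channel therefore does not prevent flits on another virtual channel of the same port from being buffered.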

V. Buffer Architecture for Generic NoC Router 

The router buffer design is shown in Figure 3. Router
buffers can be implemented either as SRAMs (Static Random
Access Memory) or as FIFO (First-In-First-Out) shift registers.
FIFO registers are better suited to power-constrained,
area-efficient NoC architectures [13], as SRAMs require additional
area for the address decoding logic and involve higher
switching activity during memory accesses. Hence, a FIFO
implementation is used in the NoC architecture, as shown in
Figure 3.

For a router architecture with 'P' ports, 'v' VCs per port, and
'k' flit buffers per VC, the total number of flit buffers per port
is z = vk.



Figure 3. Buffer architecture for Generic NoC router. 

In the buffer architecture, the VC identifier of the incoming
flit allows the DEMUX to switch to the correct input VC. The
WP (write pointer) and the RP (read pointer) are used to write
the incoming flit into the buffer and to read the flit out to the
crossbar, respectively. The RP points to the next flit to be
transmitted, and the WP points to the next empty slot, into
which the incoming data is written. When the RP reads a flit out
of the buffer, a credit is returned to the upstream router to
indicate that it can send another flit [13].
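
The pointer behavior described above can be sketched as a small behavioral model in Python. This is not the authors' Verilog design: the `FlitFifo` class, the occupancy counter used to tell a full buffer from an empty one, and the `credits_returned` counter standing in for the credit signal are all assumptions made for illustration.

```python
from typing import List, Optional


class FlitFifo:
    """k-slot circular flit buffer with a read pointer and a write pointer."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.slots: List[Optional[int]] = [None] * depth
        self.rp = 0                # read pointer: next flit to transmit
        self.wp = 0                # write pointer: next empty slot
        self.count = 0             # occupied slots (assumed full/empty check)
        self.credits_returned = 0  # credits sent back to the upstream router

    def write(self, flit: int) -> bool:
        """Write an incoming flit at WP; returns False if the buffer is full."""
        if self.count == self.depth:
            return False
        self.slots[self.wp] = flit
        self.wp = (self.wp + 1) % self.depth
        self.count += 1
        return True

    def read(self) -> Optional[int]:
        """Read the flit at RP toward the crossbar and return one credit.

        Returns None, and returns no credit, if the buffer is empty.
        """
        if self.count == 0:
            return None
        flit = self.slots[self.rp]
        self.slots[self.rp] = None
        self.rp = (self.rp + 1) % self.depth
        self.count -= 1
        self.credits_returned += 1
        return flit
```

In a four-slot buffer, four writes fill the buffer and wrap WP back to slot 0; a fifth write is refused until a read frees a slot and returns a credit. Both pointers advance modulo the buffer depth, so no flit is ever shifted from slot to slot.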

VI. Importance of Virtual Channel Buffer 

The requirement for large buffering space can be addressed
using the wormhole switching method [14]. In wormhole
switching, packets are split into flow control digits (flits),
which snake along the route in a pipelined fashion. A router
therefore does not need large buffers for whole packets, only
small buffers for a few flits. A header flit builds the routing
path, which the remaining data flits then follow. The
disadvantage of wormhole switching is that the length of the
path occupied by a packet is proportional to the number of flits
in the packet. In addition, if the header flit is blocked by
congestion, the whole chain of flits stalls and blocks other flits
in turn. This can lead to deadlock, in which the network stalls
because all buffers are full and a circular dependency exists
between nodes. The concept of virtual channels [14] was
introduced to provide deadlock-free routing in wormhole-switched
networks. This method splits one physical channel into several
virtual channels. Figure 4 shows the concept of a virtual
channel. Since most Network-on-Chip systems need less buffering
space and have a low latency requirement, the wormhole switching
method with virtual channels is the most suitable switching
method.

Figure 4. Concept of a virtual channel.

VII. Implementation and Results 

Each input port of the NoC router has 'v' virtual channels,
each of which has a dedicated k-flit FIFO buffer. The need for
very low latency dictates the use of a parallel FIFO
implementation. As opposed to a serial FIFO implementation,
the parallel flavor eliminates the need for a flit to traverse all
slots in a pipelined manner before exiting the buffer. The NoC
router design considered here has 4 VCs per input port (i.e.
v = 4), and each VC buffer is four flits deep (i.e. k = 4), with
each flit 32 bits long.
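
The buffer budget implied by these parameters follows from the relation z = vk given in Section V. The sketch below works through the arithmetic. The port count P = 5 is an assumption taken from the typical value quoted in Section IV, since the paper does not state the port count of the implemented router, and `buffer_budget` is a hypothetical helper name.

```python
from typing import Dict


def buffer_budget(ports: int, vcs: int, depth: int, flit_bits: int) -> Dict[str, int]:
    """Return flit-buffer counts and storage bits per port and per router."""
    flits_per_port = vcs * depth  # z = vk
    return {
        "flits_per_port": flits_per_port,
        "bits_per_port": flits_per_port * flit_bits,
        "flits_per_router": ports * flits_per_port,
        "bits_per_router": ports * flits_per_port * flit_bits,
    }


# v = 4, k = 4, and 32-bit flits as in the text; P = 5 is an assumed port count.
budget = buffer_budget(ports=5, vcs=4, depth=4, flit_bits=32)
```

Under these assumptions each input port holds z = 16 flit buffers (512 bits), and a five-port router holds 80 flit buffers (2560 bits) in total.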

The design is implemented in structural Register Transfer
Level (RTL) Verilog and synthesized using Xilinx ISE Design
Suite 12.2. The simulation result for the FIFO buffer, which is
4 flits deep with 32-bit flits, is shown in Figure 5.

Figure 5. Simulation result for FIFO Buffer.



VIII. Conclusion 



Since most Network-on-Chip systems need less buffering
space and have a low latency requirement, the wormhole
switching method with virtual channels is the most suitable
switching method. Here, FIFO buffers are used as virtual
channels to avoid deadlock and thus increase throughput.